
Fix the usage of meta information #23


Open
wants to merge 1 commit into master

Conversation

@artpol84 commented Jun 5, 2025

Hello,

While reviewing the code, I spotted a potential misuse of the meta-arrays generated during the dispatch phase.
Please find my reasoning below.

Here, instead of getting the sequential indices of the tokens sent to this specific expert, the global token numbers are obtained.
This results in the tokens being written in a non-contiguous fashion (instead of being compacted at the beginning of the corresponding buffer segment).

Because the target buffer is always large enough to accommodate all tokens, this does not cause a buffer overflow.
And because there is no data verification (yet?), the issue has gone unnoticed.
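For illustration, here is a minimal CUDA sketch of the difference between the two indexing schemes. Everything here is hypothetical (`dispatchSketch`, `perExpertCount`, `expertForToken`, `capacity`, the buffer layout); it is not the actual kernel code from this repo.

```cpp
#include <cuda_runtime.h>

// Hypothetical dispatch kernel: one block per token. Each expert owns a
// contiguous segment of dstBuffer of `capacity` tokens, large enough to hold
// every token -- which is why the bug cannot overflow; it only leaves holes
// in the segment instead of packing it.
__global__ void dispatchSketch(const float *tokens,         // [numTokens, hidden]
                               const int   *expertForToken, // [numTokens]
                               float       *dstBuffer,      // [numExperts, capacity, hidden]
                               int         *perExpertCount, // [numExperts], zeroed before launch
                               int numTokens, int hidden, int capacity) {
  int tok = blockIdx.x;
  if (tok >= numTokens) return;
  int expert = expertForToken[tok];

  __shared__ int slot;
  if (threadIdx.x == 0) {
    // Correct: sequential index among the tokens routed to this expert,
    // so the expert's segment is densely packed from the beginning.
    slot = atomicAdd(&perExpertCount[expert], 1);
    // The behaviour being reported is equivalent to using the global token
    // number instead:
    //   slot = tok;
    // which scatters the tokens non-contiguously across the segment.
  }
  __syncthreads();

  float *dst = dstBuffer + ((size_t)expert * capacity + slot) * hidden;
  for (int i = threadIdx.x; i < hidden; i += blockDim.x)
    dst[i] = tokens[(size_t)tok * hidden + i];
}
```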

@abcdabcd987 abcdabcd987 requested a review from nandor June 5, 2025 16:29
@artpol84 (Author) commented Jun 5, 2025

BTW, the remote token number (stored in sourceToken) seems to be unused now.
It may be good information for logging/verification/etc., but it shouldn't be needed for the data exchange.
IMO, the expert shouldn't care about the original indexing of the tokens on the originating rank. All that matters is the logical order of the incoming tokens as observed by the expert (which should be stored in sourceIndex).
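To make the distinction concrete, this is how I read the two fields. The struct below is purely illustrative: the field names come from this discussion, not from the actual layout in the code.

```cpp
// Illustrative per-token metadata as I understand it (not the real layout).
struct TokenMetaSketch {
  int sourceToken;  // token's index on the originating rank; useful for
                    // logging/verification, and possibly for scattering
                    // results back to the right slot during combine
  int sourceIndex;  // arrival order of the token as seen by the expert --
                    // the only index the expert kernel itself should need
};
```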

@nandor (Collaborator) commented Jun 5, 2025

It is used in combine

@artpol84 (Author) commented Jun 6, 2025

Sketches of the current and possible solutions as discussed offline:

[Two attached images: sketches of the current and the proposed indexing schemes.]

@artpol84 (Author) commented Jun 6, 2025

While reviewing the code, I noticed that the kernels copy data from CUDA buffers into NVSHMEM buffers before sending. For example, in dispatch here. This effectively disables the zero-copy path that would otherwise be possible, since NVSHMEM can send from both NVSHMEM and regular CUDA buffers.

The copy is currently required for two reasons:

  1. To combine the token and its "scale" portion into a single message
  2. To piggy-back the token number (here)

In the proposed variant, no piggy-backing is required, so if it is acceptable performance-wise to send the token and its scale separately, the copy can be dropped entirely.

I want to experiment with this to see whether it gives any improvement.
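A rough sketch of the two send paths. The helper names and buffer layout are hypothetical (not the code in this repo), and in the zero-copy variant the scale would go out in a separate, smaller put.

```cpp
#include <nvshmem.h>

// Current approach (as I read it): stage the token (plus scale / piggy-backed
// index) into a symmetric NVSHMEM buffer, then put the staged copy.
__device__ void sendTokenStaged(float *symStage,   // per-token symmetric staging area
                                float *symRemote,  // symmetric destination on the peer
                                const float *token, int hidden, int peer) {
  for (int i = threadIdx.x; i < hidden; i += blockDim.x)
    symStage[i] = token[i];
  __syncthreads();
  if (threadIdx.x == 0)
    nvshmem_float_put_nbi(symRemote, symStage, hidden, peer);
}

// Proposed variant: with no piggy-backed token number, the token can be put
// directly from the regular CUDA buffer it already lives in. Only the
// destination of a put has to be symmetric; depending on the transport, a
// non-symmetric source may still need to be registered with NVSHMEM.
__device__ void sendTokenZeroCopy(float *symRemote, const float *token,
                                  int hidden, int peer) {
  if (threadIdx.x == 0)
    nvshmem_float_put_nbi(symRemote, token, hidden, peer);
}
```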
